Generic Inverted Index on the GPU

Authors

  • Jingbo Zhou
  • Qi Guo
  • H. V. Jagadish
  • Wenhao Luan
  • Anthony K. H. Tung
  • Yueji Yang
  • Yuxin Zheng
Abstract

Data variety, one of the three Vs of Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high-dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Because these customized indexes are so diverse, it is hard to devise a general parallelization strategy to speed up the search. In this paper, we propose a generic inverted index on the GPU (called GENIE), which can support similarity search of multiple queries on various data types. GENIE can effectively support approximate nearest neighbor search under different similarity measures by employing Locality Sensitive Hashing (LSH) schemes. It also supports similarity search on original data such as short document data and relational data. Extensive experiments on different real-life datasets demonstrate the efficiency and effectiveness of our system.
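The abstract does not spell out GENIE's internal similarity model, so the sketch below rests on stated assumptions. It shows one way an LSH scheme can turn vectors into "keywords" that a single inverted index can serve. It assumes random-hyperplane LSH (SimHash) for cosine similarity as the example scheme, and it scores objects with a simple match count, meaning the number of query keywords an object shares. All function names (`make_hyperplanes`, `lsh_keywords`, `build_index`, `search_batch`) are hypothetical. The code runs sequentially on the CPU and is not GENIE's implementation.

```python
import random
from collections import defaultdict

# Hypothetical sketch: an inverted index over LSH "keywords" with match-count scoring.
# Sequential CPU code for illustration only; not GENIE's actual implementation.

def make_hyperplanes(dim, m, seed=0):
    """Draw m random Gaussian hyperplanes for random-hyperplane LSH (cosine similarity)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]

def lsh_keywords(vec, planes):
    """Map a vector to m keywords (i, bit), where bit says which side of hyperplane i it lies on."""
    return {(i, sum(p * x for p, x in zip(plane, vec)) >= 0.0)
            for i, plane in enumerate(planes)}

def build_index(keyword_sets):
    """Inverted index: keyword -> list of ids of the objects that contain it."""
    index = defaultdict(list)
    for oid, keywords in enumerate(keyword_sets):
        for kw in keywords:
            index[kw].append(oid)
    return index

def search_batch(index, num_objects, queries, k):
    """For each query, count keyword matches per object and return the top-k (id, count) pairs."""
    results = []
    for query in queries:
        counts = [0] * num_objects
        # Posting-list scan. On a GPU these scans could plausibly run in parallel,
        # e.g. one thread per posting entry with atomic count updates (an assumption).
        for kw in query:
            for oid in index.get(kw, ()):
                counts[oid] += 1
        # Highest count first; ties broken by smaller object id.
        ranked = sorted(range(num_objects), key=lambda o: (-counts[o], o))
        results.append([(o, counts[o]) for o in ranked[:k]])
    return results

if __name__ == "__main__":
    data = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
    planes = make_hyperplanes(dim=2, m=16)
    index = build_index([lsh_keywords(v, planes) for v in data])
    queries = [lsh_keywords([1.0, 0.0], planes)]
    print(search_batch(index, len(data), queries, k=3))
```

With m hash functions, the match count between a query and an object estimates how often they fall on the same side of a random hyperplane. That frequency grows with their cosine similarity. A vector identical to the query therefore matches all m keywords, and its exact negation matches none. Because each object's keywords form an ordinary posting-list entry, the same index and counting loop would work for any data type that can be mapped to keywords, which is the generality the abstract describes.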

Similar resources

A Generic Inverted Index Framework for Similarity Search on the GPU - Technical Report

Data variety, as one of the three Vs of the Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity of customized indexes, it is hard to devise a general paralleliz...

Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units

Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines – lists intersection and inde...

A GPU-based parallel algorithm for time series pattern mining

Mining time series patterns is an important research area, in which computing the LCSS (Longest Common Subsequence) between high-dimensional time series is one of the most important issues. Large-scale data need to be handled in practical applications, so research on efficient retrieval methods is becoming a practical necessity. Based on the issues above, we propose an efficient parallel algorithm to...

Comparison between BMI and Inverted BMI in Evaluating Metabolic Risk and Body Composition in Iranian Children

Objectives: To compare BMI and inverted BMI in evaluating body measurement, resting blood pressure, Dual energy X-ray absorptiometry (DEXA) parameters of fat mass and metabolic risk factors in Iranian children. Materials and Methods: This is a cross-sectional study on 477 children aged 9-18 years in the South of Iran. Weight, height, resting blood pressure, waist and hip circumference and puberta...

Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal

Noise removal is commonly applied as a pre-processing step before subsequent image processing tasks, because noise occurs during acquisition or transmission. A common problem in imaging systems that use CMOS or CCD sensors is the appearance of salt and pepper noise. This paper presents a Cellular Automata (CA) framework for noise removal of distorted image by the salt an...

Journal:
  • CoRR

Volume: abs/1603.08390

Publication date: 2015